Goto

Collaborating Authors

 fl algorithm



STEM: AStochastic Two-Sided Momentum Algorithm Achieving Near-Optimal Sample and Communication Complexities for Federated Learning

Neural Information Processing Systems

Federated Learning (FL) refers to the paradigm where multiple worker nodes (WNs) build a joint model by using local data. Despite extensive research, for a generic non-convex FL problem, it is not clear, how to choose the WNs' and the server's update directions, the minibatch sizes, and the number of local updates, so that the WNs use the minimum number of samples and communication rounds to achieve the desired solution. This work addresses the above question and considers a class of stochastic algorithms where the WNs perform a few local updates before communication. We show that when both the WN's and the server's directions are chosen based on certain stochastic momentum estimator, the algorithm requires O( 3/2) samples and O( 1) communication rounds to compute an -stationary solution. To the best of our knowledge, this is the first FL algorithm that achieves such near-optimal sample and communication complexities simultaneously. Further, we show that there is a trade-off curve between the number of local updates and the minibatch sizes, on which the above sample and communication complexities can be maintained. Finally, we show that for the classical FedAvg (a.k.a. Local SGD, which is a momentum-less special case of the STEM), a similar trade-off curve exists, albeit with worse sample and communication complexities. Our insights on this trade-off provides guidelines for choosing the four important design elements for FL algorithms, the number of local updates, WNs' and server's update directions, and minibatch sizes to achieve the best performance.


STEM: A Stochastic Two-Sided Momentum Algorithm Achieving Near-Optimal Sample and Communication Complexities for Federated Learning

Neural Information Processing Systems

Federated Learning (FL) refers to the paradigm where multiple worker nodes (WNs) build a joint model by using local data. Despite extensive research, for a generic non-convex FL problem, it is not clear, how to choose the WNs' and the server's update directions, the minibatch sizes, and the local update frequency, so that the WNs use the minimum number of samples and communication rounds to achieve the desired solution. This work addresses the above question and considers a class of stochastic algorithms where the WNs perform a few local updates before communication. We show that when both the WN's and the server's directions are chosen based on certain stochastic momentum estimator, the algorithm requires $\tilde{\mathcal{O}}(\epsilon^{-3/2})$ samples and $\tilde{\mathcal{O}}(\epsilon^{-1})$ communication rounds to compute an $\epsilon$-stationary solution. To the best of our knowledge, this is the first FL algorithm that achieves such {\it near-optimal} sample and communication complexities simultaneously. Further, we show that there is a trade-off curve between local update frequencies and local minibatch sizes, on which the above sample and communication complexities can be maintained.







FedNAR: Federated Optimization with Normalized Annealing Regularization

Neural Information Processing Systems

Weight decay is a standard technique to improve generalization performance in modern deep neural network optimization, and is also widely adopted in federated learning (FL) to prevent overfitting in local clients. In this paper, we first explore the choices of weight decay and identify that weight decay value appreciably influences the convergence of existing FL algorithms. While preventing overfitting is crucial, weight decay can introduce a different optimization goal towards the global objective, which is further amplified in FL due to multiple local updates and heterogeneous data distribution.To address this challenge, we develop {\it Federated optimization with Normalized Annealing Regularization} (FedNAR), a simple yet effective and versatile algorithmic plug-in that can be seamlessly integrated into any existing FL algorithms. Essentially, we regulate the magnitude of each update by performing co-clipping of the gradient and weight decay.We provide a comprehensive theoretical analysis of FedNAR's convergence rate and conduct extensive experiments on both vision and language datasets with different backbone federated optimization algorithms. Our experimental results consistently demonstrate that incorporating FedNAR into existing FL algorithms leads to accelerated convergence and heightened model accuracy. Moreover, FedNAR exhibits resilience in the face of various hyperparameter configurations. Specifically, FedNAR has the ability to self-adjust the weight decay when the initial specification is not optimal, while the accuracy of traditional FL algorithms would markedly decline.


FedDSR: Federated Deep Supervision and Regularization Towards Autonomous Driving

arXiv.org Artificial Intelligence

Federated Learning (FL) enables collaborative training of autonomous driving (AD) models across distributed vehicles while preserving data privacy. However, FL encounters critical challenges such as poor generalization and slow convergence due to non-independent and identically distributed (non-IID) data from diverse driving environments. To overcome these obstacles, we introduce Federated Deep Supervision and Regularization (FedDSR), a paradigm that incorporates multi-access intermediate layer supervision and regularization within federated AD system. Specifically, FedDSR comprises following integral strategies: (I) to select multiple intermediate layers based on predefined architecture-agnostic standards. (II) to compute mutual information (MI) and negative entropy (NE) on those selected layers to serve as intermediate loss and regularizer. These terms are integrated into the output-layer loss to form a unified optimization objective, enabling comprehensive optimization across the network hierarchy. (III) to aggregate models from vehicles trained based on aforementioned rules of (I) and (II) to generate the global model on central server. By guiding and penalizing the learning of feature representations at intermediate stages, FedDSR enhances the model generalization and accelerates model convergence for federated AD. We then take the semantic segmentation task as an example to assess FedDSR and apply FedDSR to multiple model architectures and FL algorithms. Extensive experiments demonstrate that FedDSR achieves up to 8.93% improvement in mIoU and 28.57% reduction in training rounds, compared to other FL baselines, making it highly suitable for practical deployment in federated AD ecosystems.